Prediction of Thermostability from Amino Acid Attributes by Combination of Clustering with Attribute Weighting: A New Vista in Engineering Enzymes

Authors

  • Mansour Ebrahimi
  • Amir Lakizadeh
  • Parisa Agha-Golzadeh
  • Esmaeil Ebrahimie
  • Mahdi Ebrahimi
Abstract

The engineering of thermostable enzymes is receiving increased attention. The paper, detergent, and biofuel industries, in particular, seek to use environmentally friendly enzymes instead of toxic chlorine chemicals. Enzymes typically function at temperatures below 60°C and denature if exposed to higher temperatures. In contrast, a small portion of enzymes can withstand higher temperatures as a result of various structural adaptations. Understanding the protein attributes that are involved in this adaptation is the first step toward engineering thermostable enzymes. We employed various supervised and unsupervised machine learning algorithms as well as attribute weighting approaches to find amino acid composition attributes that contribute to enzyme thermostability. Specifically, we compared two groups of enzymes: mesostable and thermostable enzymes. Furthermore, a combination of attribute weighting with supervised and unsupervised clustering algorithms was used for prediction and modelling of protein thermostability from amino acid composition properties. Mining a large number of protein sequences (2090) with a variety of machine learning algorithms, based on the analysis of more than 800 amino acid attributes, increased the accuracy of this study. Moreover, these models were successful in predicting thermostability from the primary structure of proteins. The results showed that expectation maximization clustering in combination with uncertainty and correlation attribute weighting algorithms can effectively (100%) classify thermostable and mesostable proteins. Seventy per cent of the weighting methods selected Gln content and frequency of hydrophilic residues as the most important protein attributes. At the dipeptide level, the frequency of Asn-Glu was the key factor in distinguishing mesostable from thermostable enzymes.
This study demonstrates the feasibility of predicting thermostability irrespective of sequence similarity and will serve as a basis for engineering thermostable enzymes in the laboratory.
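
As a rough illustration of the kind of attributes the abstract names (Gln content, frequency of hydrophilic residues, Asn-Glu dipeptide frequency), the sketch below computes them from a raw sequence and scores one attribute by its correlation with a class label. It is not the authors' pipeline: the function names, the hydrophilic residue set, and the reading of "correlation attribute weighting" as absolute Pearson correlation with a 0/1 label are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Assumption: one common choice of hydrophilic residues; the paper's set may differ.
HYDROPHILIC = set("RNDQEHKST")


def composition_features(seq):
    """Return a few amino acid composition attributes for one protein sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    # Overlapping dipeptides: positions (0,1), (1,2), ...
    dipeptides = Counter(seq[i:i + 2] for i in range(n - 1))
    return {
        "gln_fraction": counts["Q"] / n,
        "hydrophilic_fraction": sum(counts[a] for a in HYDROPHILIC) / n,
        "asn_glu_fraction": dipeptides["NE"] / max(n - 1, 1),
    }


def correlation_weight(values, labels):
    """Absolute Pearson correlation between one attribute and a 0/1 class label.

    Hypothetical stand-in for a correlation-based attribute weight;
    returns 0.0 when either input is constant.
    """
    n = len(values)
    mean_v = sum(values) / n
    mean_l = sum(labels) / n
    cov = sum((v - mean_v) * (l - mean_l) for v, l in zip(values, labels))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    if var_v == 0 or var_l == 0:
        return 0.0
    return abs(cov) / sqrt(var_v * var_l)
```

In use, one would call `composition_features` on every sequence, collect each attribute into a column, and rank the columns by `correlation_weight` against labels such as 0 for mesostable and 1 for thermostable. The clustering step described in the abstract is not shown here.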

Similar resources

Engineering Thermostable Enzymes; Application of Unsupervised Clustering Algorithms

There is high demand for engineering thermostable enzymes in some industries, especially in the paper industry, to use environmentally friendly enzymes instead of toxic chlorine chemicals. Hence, understanding the protein attributes involved in enzyme thermostability is important. Herein, the most important protein features contributing to enzyme thermostability were searched by using data mining algor...

Amino acid features: a missing compartment of prediction of protein function (Nature Precedings)

Enormous computational efforts have been made to predict the structure and function of proteins. However, nearly all of these efforts have focused on predicting function from the primary nucleic acid sequence or on modelling the 3D structure of a protein from its nucleic acid sequence. In fact, it seems that amino acid attributes, which are an intermediate phase between DNA/RNA and advanced pro...

Sensitivity Analysis of Simple Additive Weighting Method (SAW): The Results of Change in the Weight of One Attribute on the Final Ranking of Alternatives

Most of the data in a multi-attribute decision making (MADM) problem are unstable and changeable, so sensitivity analysis after problem solving can effectively contribute to making accurate decisions. This paper provides a new method for sensitivity analysis of MADM problems so that by using it and changing the weights of attributes, one can determine changes in the final results of a decision ma...

Using a Principal Component Analysis Approach to Weight the Statistical, Climatic, and Geographical Attributes of Maximum 24-Hour Rainfall, and Spatial Analysis of Clustering (Case Study: Lake Urmia Basin)

Regionalization is a useful tool for carrying out effective analyses in regions lacking data or with only incomplete data. One of the regionalization methods widely used in hydrological studies is the clustering approach. Moreover, another factor that affects clustering is the degree of importance and participation level of each of these attributes. In this study, it was t...

Classification of Lung Cancer Tumors Based on Structural and Physicochemical Properties of Proteins by Bioinformatics Models

Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in the diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, classifying proteins and the prediction performance. Herein, in this study is the...

Journal title:

Volume 6  Issue 

Pages  -

Publication date: 2011